Abstract: A sample of programs, written in FORTRAN by a wide variety
of people for a wide variety of applications, was chosen "at
random" in an attempt to discover quantitatively "what
programmers really do." Statistical results of this survey
are presented here, together with some of their apparent
implications for future work in compiler design. The principal
conclusion which may be drawn is the importance of a program
"profile," namely a table of frequency counts which record how
often each statement is performed in a typical run; there are
strong indications that profile-keeping should become a standard
practice in all computer systems, for casual users as well as
system programmers. This paper is the report of a three month
study undertaken by the author and about a dozen students and
representatives of the software industry during the summer of 1970.
It is hoped that a reader who studies this report will obtain
a fairly clear conception of how FORTRAN is being used, and
what compilers can do about it.

1. Introduction

Designers of compilers and instructors of computer science usually
have comparatively little information about the way in which programming
languages are actually used by typical programmers. We think we know what
programmers generally do, but our notions are rarely based on a representative
sample of the programs which are actually being run on computers. Since
compiler writers must prepare a system capable of translating a language
in all its generality, it is easy to fall into the trap of assuming that
complicated constructions are the norm when in fact they are infrequently
used. There has been a long history of optimizing the wrong things, using
elaborate mechanisms to produce beautiful code in cases that hardly ever
arise in practice, while doing nothing about certain frequently occurring
situations. For example, the present author once found great significance
in the fact that a certain complicated method was able to translate the
statement

C[lxK+J]  :=  ((A+X)xY)  +  2.768+ ((L-M)x(-K))/Z 
into only 19 machine instructions compared to the 21 instructions obtained
by a previously published method due to Caller et al. (See Knuth [11].)
The fact that arithmetic expressions usually have an average length of only
two operands, in practice, would have been a great shock to the author at
that time!

There has been widespread realization that more data about language
use is needed; we can't really compare two different compiler algorithms
until we understand the input data they deal with. Of course, the great
difficulty is that there is no such thing as a "typical programmer"; there
is a tremendous variation among programs written by different people
with different backgrounds and sympathies, and indeed there is considerable
variation even in different programs written by the same person. Therefore
we cannot trust any measurements to be very accurate, although we can measure
the degree of variation in an attempt to determine how significant it is.
Not all properties of programs can be reduced to simple statistics; it is
necessary to study selected programs in detail in order to appreciate their
characteristics more clearly. For a survey of early work on performance
measurement and evaluation, see Calingaert [2] and Cerf [3].

During  the  summer  of  1970,  the  author  worked  together  with  several 
other  people,  in  order  to  explore  the  nature  of  actual  programs  and  the 
corresponding  implications  both  for  software  design  and  for  computer  science 
education. Members of the group included G. Autrey, D. Brown, I. Fang,
D. Ingalls, J. Low, F. Maginnis, M. Maybury, D. McNabb, E. Satterthwaite,
R. Sites, R. Sweet, and J. Walters; these people did all of the hard work
which  led  to  the  results  in  this  report.  Our  results  are  by  no  means  a 
definitive  analysis  of  programming  behavior;  our  goal  was  to  explore  the 
various  possibilities,  as  a  group,  in  order  to  set  the  stage  for  subsequent 
individual  research,  rather  than  to  go  off  in  all  directions  at  once.  Each 
week  the  entire  group  had  an  eight-hour  meeting,  in  order  to  discuss  what 
had  been  learned  during  the  previous  week,  hoping  that  by  combining  our 
differing  points  of  view  we  might  arrive  at  something  reasonably  close  to 
Truth. 


A first idea for obtaining "typical" programs was to go to Stanford's
Computation Center and rummage in the wastebaskets and the recycling bins.
This gave results but showed immediately what should have been obvious:
wastebaskets usually receive undebugged programs. Furthermore, it seems
likely that compilers usually are confronted with undebugged programs, too;
so it was necessary for us to choose whether we wanted to study the
distributions of syntax errors, etc., or to concentrate on working
programs. Some excellent analyses of common errors have already been
made (Freeman; Moulton and Muller [13]), and one of our main goals
was to study the effects of various types of optimization; so we decided
to restrict ourselves to programs which actually run to completion.

The wastebasket method turned up some interesting programs, but it was
not really satisfactory. If we wanted to automate the process, extensive
typing from the listings would have been necessary; so we tried another
tack. Our next method of obtaining programs was to post a man by the
card reader at various times; he would ask for permission to copy decks
onto a special file. Fifteen programs, totalling about 5-100 cards, were
obtained in this way; but the job was very time-consuming since it was
necessary to explain the objectives of our project each time and to ask
embarrassing questions about the status of people's programs.

The next approach was to probe randomly among the semi-protected
files stored on disks, looking for source text; this was successful,
resulting in 35 programs, totalling about 20,000 cards. We added nine
programs from the CSD subroutine library and three programs from the
"Scientific Subroutine Package", and some production programs from the
Stanford Linear Accelerator Center. A few classical benchmark programs
(nuclear codes, weather codes, and aerospace calculations) were also
contributed by IBM representatives, and to top things off we threw in some
programs of personal interest to members of the group.

The result was a quite varied collection of programs: some
large, some small; some sophisticated, some crude; some important, some
trivial; some for production, some for play; some numerical, some
combinatorial.

It is well known that different programming languages evolve different
styles of programming, so our study was necessarily language-dependent.
For example, one would expect that expressions in APL programs tend to
be longer than in FORTRAN programs. But virtually all of the programs
obtained by our sampling procedure were written in FORTRAN (this was the
first surprise of the summer), so our main efforts were directed toward the
study of FORTRAN programs.*/

Was this sample representative? Perhaps the users of Stanford's
computers are more sophisticated than the general programmers to be found
elsewhere; after all we have such a splendid Computer Science Department!
But it is doubtful whether our Department had any effect on these programs,
because for one thing we don't teach FORTRAN; it was distressing to see what
little impact our courses seem to be having, since virtually all of the
programs we saw were apparently written by people who had learned programming
elsewhere. Furthermore, the general style of programming that we found
showed very little evidence of "sophistication"; if it was better than
average, the average is too horrible to contemplate! (This remark is not
intended as an insult to Stanford's programmers; after all we were invading
their privacy, and they would probably have written the programs differently
if they had known the code was to be scrutinized by self-appointed experts
like ourselves. Our purposes were purely scientific, in an attempt to find
out how things are, without moralizing or judging people's competence.
The point is that the Stanford sample seems to be reasonably typical of
what might be found elsewhere.) Another reason for believing that our
sample was reasonably good is that the programs varied from text-editing
and discrete calculations to number-crunching; they were by no
means from a homogeneous class of applications. On the other hand we do
have some definite evidence of differences between the Stanford sample and
another sample of over 400 programs written at Lockheed (see Section 2 of
this report).

*/ By contacting known users of ALGOL, it was possible to collect a fairly
representative sample of ALGOL W programs as well. The analysis of
these programs is still incomplete; preliminary indications are that
the increased flexibility of data types in ALGOL W makes for much more
variety in the nature of inner loops than was observed in FORTRAN, and
that the improved control structures make GO TO's and labels considerably
less frequent. A comprehensive analysis of ALGOL 60 programs has
recently been completed by B. Wichmann [19].
We analyzed one PL/I program by hand. COBOL is not used at Stanford's
Computation Center, and we have no idea what typical COBOL programs are like.

The programs obtained by this sampling procedure were analyzed in
various ways. First we performed a static analysis, simply counting the
number of occurrences of easily recognizable syntactic constructions.
Statistics of this kind are relevant to the speed of compilation. The
results of this static analysis are presented in Section 2. Secondly, we
selected about 25 of the programs at random and subjected them to a dynamic
analysis, taking into account the frequency with which each construction
actually occurs during one run of the program; statistics of this kind are
presented in Section 3. We also considered the "inner loops" of 17 programs,
translating them by hand into machine language using various styles of
optimization in an attempt to weigh the utility of various local and global
optimization strategies; results of this study are presented in Section 4.
Section 5 of this paper summarizes the principal conclusions we reached,
and lists several areas which appear to be promising for future study.

2. Static Statistics

We examined a large number of FORTRAN programs to see how frequently
certain constructions are used in practice. Over 230,000 cards
(representing 4[?] programs) were analyzed by Mr. Maybury at the computer
center of Lockheed Missiles and Space Corporation in Sunnyvale.

Table 1 shows the distribution of statement types. A "typical
Lockheed program" consists of 120 comment cards, plus 1[?]8 assignment
statements, [?] IF's, 3[?] GO TO's, [?] CALL's, 21 CONTINUE's, 18 WRITE's,
1[?] FORMAT's, 1[?] DO's, 72 miscellaneous other statements, and 31 continuation
cards (mostly involving COMMON or DATA). Essentially the same overall
distribution of statement types was obtained when individual groups of
about [?] programs were tested, so these statistics tended to be rather
stable. We forgot to test how many statements had nonblank labels.

The same test was run on a much smaller but still rather large
collection of programs from our "Stanford sample" (about 11,000 cards).
Unfortunately the corresponding percentages shown in Table 1 do not agree
very well with the Lockheed sample; Stanfordites definitely use more
assignments and fewer IF's and GO TO's than Lockheedians. A superficial
examination of the programs suggests that Lockheed programmers are
perhaps more careful to check for erroneous conditions in their data.
Note also that 2.7 times as many comments appear on the Lockheed programs,
indicating somewhat more regimentation. The professional programmers at
Lockheed have a distinctly different style from Stanford's casual coders.
Table 1. Distribution of statement types.

                      Lockheed                  Stanford
Statement         Number        Percent*    Number    Percent*

Assignment        731[?]35      [?]         4879      51
IF                [?]70[?]7**   14.5**      816**     8.5**
GO TO             [?]           15          777       8
CALL              [?]           8           339       4
CONTINUE          91[?]3        5           309       3
WRITE             7795          [?]         508       5
FORMAT            7[?]35        [?]         380       4
DO                71[?]77       [?]         [?]57     5
DATA              [?]8          [?]         28        .3
RETURN            57[?]9        2           186       2
DIMENSION         [?]92         [?]         141       1.5
COMMON            2908          1.5         275       3
END               2579          1           121       1
BUFFER            2501          1           0         0
SUBROUTINE        2001          1           93        1
REWIND            [?]           1           6         -
EQUIVALENCE       1582          .7          113       1
ENDFILE           765           .4          2         -
INTEGER           757           .3          34        .3
READ              587           .3          92        1
ENCODE            585           .3          0         [?]
DECODE            557           .5          0         -
PRINT             345           .2          5         -
ENTRY             279           .1          15        .2
STOP              190           .1          11        .1
LOGICAL           170           .1          9         .1
REAL              147           .1          3         -
IDENT             106           .1          0         -
DOUBLE            3             -           99        1
OVERLAY           82            -           0         -
PAUSE             57            -           6         .1
ASSIGN            57            -           4         -
PUNCH             52            -           5         .1
EXTERNAL          23            -           1         -
IMPLICIT          0             -           16        1-5
COMPLEX           6             -           0         -
NAMELIST          5             -           0         -
BLOCKDATA         1             -           0         -
INPUT             0             -           0         -
OUTPUT            0             -           0         -

COMMENT           5292[?]       (28)        1090      (11)
CONTINUATION      13709         (7)         636       (7)

*  Percent of total number of statements excluding comments and continuation
   cards.

** The construction 'IF ( ) statement' counts as an IF as well as a
   statement, so the total is more than 100%.
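The static tally behind Table 1 can be illustrated with a small sketch. The following Python fragment is not the program used in the study; it is a minimal illustration, assuming fixed-form FORTRAN cards (comment mark in column 1, continuation mark in column 6, statement text in columns 7 to 72) and a hypothetical subset of keywords. It follows the table's convention that "IF ( ) statement" counts both as an IF and as the embedded statement, and it ignores subtleties such as Hollerith constants and keywords that are prefixes of one another (e.g. DOUBLE or ENDFILE).

```python
from collections import Counter

# Hypothetical subset of the statement keywords listed in Table 1.
KEYWORDS = ("GOTO", "CALL", "CONTINUE", "WRITE", "FORMAT", "DO", "RETURN",
            "DIMENSION", "COMMON", "SUBROUTINE", "READ", "STOP", "END")


def top_level_chars(text):
    """Return the set of characters that occur outside all parentheses."""
    depth, found = 0, set()
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif depth == 0:
            found.add(ch)
    return found


def classify_statement(text):
    """Classify one blank-free, upper-case statement body."""
    if not text:
        return []
    if text.startswith("IF("):
        depth = 0
        for i, ch in enumerate(text):
            depth += (ch == "(") - (ch == ")")
            if ch == ")" and depth == 0:
                rest = text[i + 1:]
                # "IF (...) statement" counts as an IF and as the statement.
                if rest and not rest[0].isdigit():
                    return ["IF"] + classify_statement(rest)
                break
        return ["IF"]
    outside = top_level_chars(text)
    # An "=" outside parentheses with no top-level comma marks an assignment;
    # "DO 10 I = 1, N" has a top-level comma and falls through to the keywords.
    if "=" in outside and "," not in outside:
        return ["ASSIGNMENT"]
    for keyword in KEYWORDS:
        if text.startswith(keyword):
            return [keyword]
    return ["OTHER"]


def classify_card(card):
    """Return the statement types contributed by one 80-column card."""
    if not card.strip():
        return []
    if card[:1] in ("C", "c", "*"):
        return ["COMMENT"]
    if len(card) > 5 and card[5] not in " 0":
        return ["CONTINUATION"]
    return classify_statement(card[6:72].replace(" ", "").upper())


def count_statement_types(cards):
    """Tally statement types over an iterable of cards."""
    counts = Counter()
    for card in cards:
        counts.update(classify_card(card))
    return counts
```

Calling `count_statement_types` on the lines of a source deck gives a `Counter` that can be converted to percentages in the manner of Table 1.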
The DO loops were further investigated to determine their length
and nesting; about [?]% of the DO statements used the default
increment of 1. Most DO loops were quite short, involving only one or
two statements:

Length      1       2         3      4      5      >5
Number      [?]     14[?]7    758    576    1045   1045
Percent     [?]     18.[?]    9.5    7      15     13

The depth of DO nesting was subject to considerable variation; the following
totals were obtained:

Depth       1       2       3       4      5      >5
Number      4[?]    1855    1194    457    118    120
Percent     5[?].5  25      15      5.5    1.5    1.5

Of the 28783 IF statements scanned, 8858 (30%) were of the "old
style" IF (...) n1,n2,n3 or IF (...) n1,n2 while the other 19925 (70%)
had the form IF (...) statement; 14258 (71%) of the latter were
"IF (...) GO TO". (These count also as GO TO statements.) Only 1307
of the 25719 GO TO statements were computed (switch) GO's.

An  average  of  about  48  trailing  blank  columns  was  found  per  non-comment 
card.  A  compiler's  lexical  scanner  should  therefore  include  a  high-speed 
skip  over  blanks. 

Assignment statements were analyzed in some detail. There were 85504
assignment statements in all; and 5[?]751 (68%) of them were trivial
replacements of the form A = B where no arithmetic operations are present.*/
The remaining assignments included 10418 of the form A = A op a, i.e.,
the first operand on the right is the same as the variable on the left. An
attempt was made to rate the complexity of an assignment statement,
counting one point for each + or - sign, five for each *, and
8 for each /; the distribution was

Complexity  0        1      2       3      4     5       6     7    8     >8
Number      5[?]751  14[?]  112[?]  10[?]  [?]   2[?]56  1938  562  2339  532
Percent     68       17.3   1.3     .1     .3    3       2     .6   3     .6

*/ In the Stanford sample the corresponding figures were 2579 out of 4869
(49%); this was another example of a Lockheed-vs.-Stanford discrepancy.

Occurrences of operators and constants were also tallied:

Operator      +      -      *        /      **     standard function  constant
Occurrences   17973  10298  1[?]739  [?]08  90237  399[?]             [?]9386

It is rather surprising to note that 7200 (40%) of the additions had the
form a+1; 349 (3%) of the multiplications had the form a*2;
180 (4%) of the divisions had the form a/2; [?]27 (39%) of the
exponentiations had the form a**2. (We forgot to count the fairly
common occurrences of 2*a, 2.*a, a*2., a/2., 2.0*a, etc.)
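The complexity rating described above is simple enough to state as code. The sketch below is only an illustration of the scoring rule (one point per + or - sign, five per *, eight per /); the function name is hypothetical, and two points are assumptions because the text does not settle them: the exponentiation operator ** is given zero points, and signs inside real constants such as 1.0E-5 are not treated specially.

```python
from collections import Counter


def assignment_complexity(rhs):
    """Score the right-hand side of an assignment statement.

    One point for each + or - sign, five for each *, eight for each /.
    Assumption: "**" is removed first and therefore scores nothing.
    """
    text = rhs.replace(" ", "").replace("**", "")
    return (text.count("+") + text.count("-")
            + 5 * text.count("*") + 8 * text.count("/"))


def complexity_distribution(right_hand_sides):
    """Tally how many assignments fall at each complexity score."""
    return Counter(assignment_complexity(rhs) for rhs in right_hand_sides)
```

Under this rule a simple replacement A = B scores 0, A = A + 1 scores 1, and a single division scores 8, which matches the spacing of the peaks in the distribution quoted in the text.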

The program analyzed indices, although it was unable to distinguish
subscripted variables from calls on programmer-defined functions. Of the
166,599 appearances of variables, 97051 (58%) were unindexed, 50979 (30.5%)
had one index, 16181 (9.5%) had two, 2008 (1%) had three, and 380 (.2%)
had four.

Another type of "static" test on the nature of FORTRAN programs was
also made, in an attempt to discover the complexity of control flow in the
programs. John Cocke's "interval reduction" scheme (see [1]) was applied
to fifty randomly-selected FORTRAN programs and subroutines, and in every
case the flow graph was reduced to a single vertex after six or fewer
transformations. The average number of transformations required per
program was only 2.75.

The obvious conclusion to draw from all these figures is that
compilers spend most of their time doing surprisingly simple things.

3. Dynamic Statistics

The static counts tabulated above are relevant to the speed of
compilation, but they do not really have a strong connection with the
speed of object program execution. We need to give more weight to
statements that are executed more frequently.

Two  different  approaches  to  dynamic  program  analysis  were  explored  in 
the  course  of  our  study,  the  method  of  frequency  counts  or  program  profiles 
and  the  method  of  program  status  sampling.  The  former  method  inserts 
counters  at  appropriate  places  of  the  program  in  order  to  determine  the 
number  of  times  each  statement  was  actually  performed;  the  latter  method 
makes  use  of  an  independent  system  program  which  interrupts  the  object 
program  periodically  and  notes  where  it  is  currently  executing  instructions. 

Frequency  counts  were  commonly  studied  in  the  early  days  of  computers 
(see  von  Neumann  and  Goldstine  [14]),  and  they  are  now  experiencing  a 
long-overdue  revival.  We  made  use  of  a  program  called  FORDAP,  which  had 
been  previously  developed  in  connection  with  some  research  on  compilation; 
FORDAP  takes  a  FORTRAN  program  as  input,  and  outputs  an  equivalent  program 
which  also  maintains  frequency  counts  and  writes  them  onto  a  file.  When 
the latter program is compiled and run, its output will include a listing of
the executable statements together with their frequency counts. See
Figure 1, which illustrates the output corresponding to a short program,
using an extension of FORDAP which includes a rough estimate of the relative
cost of each statement (Ingalls [9]). The principles of preparing such
a routine were independently developed at UCLA by S. Crocker and E. Russell [15].
Russell's efforts were primarily directed towards a study of potential
parallelism in programs, but he also included some serial analyses of large
scale routines which exhibit the same phenomena observed in our own studies.
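The idea of per-statement frequency counts can be tried out in a few lines of a modern language. The sketch below is not FORDAP: FORDAP rewrites the FORTRAN source so that the program maintains its own counters, whereas this illustration substitutes a different technique, the Python interpreter's tracing hook `sys.settrace`, to count how often each source line of one function is executed. The names `line_profile` and `demo` are hypothetical.

```python
import sys
from collections import Counter


def line_profile(func, *args):
    """Run func(*args) and count executions of each of its source lines.

    Keys of the returned Counter are line offsets from the "def" line.
    """
    counts = Counter()
    code = func.__code__

    def tracer(frame, event, arg):
        # Only trace frames that execute the function under study.
        if frame.f_code is code:
            if event == "line":
                counts[frame.f_lineno - code.co_firstlineno] += 1
            return tracer
        return None

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)
    return result, counts


def demo(n):
    total = 0               # offset 1: executed once
    for i in range(n):      # offset 2: loop control
        total += i          # offset 3: executed n times
    return total            # offset 4: executed once
```

Running `line_profile(demo, 1000)` shows at a glance that nearly all of the counts fall on the loop body, which is the kind of information a profile is meant to expose.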


Frequency counts add an important new dimension to the FORTRAN
programs; indeed, it is difficult to express in words just how tremendously
"eye-opening" they are! Even the small example in Figure 1 has a surprise
(the frequency counts reveal that about half the running time is spent in
the subroutine linkage of the FUN function). After studying dozens of
FORDAPed programs, and after experiencing the reactions of programmers
who see the frequency counts of their own programs, our group came to the almost
unanimous conclusion that all software systems should provide frequency
counts to all programmers, unless specifically told not to do so!

The  advantages  of  frequency  counts  in  debugging  have  been  exploited 
by  E.  Satterthwaite  [16]  in  his  extensions  to  Stanford's  ALGOL  W 
compiler.  They  can  be  used  to  govern  selective  tracing  and  to  locate 
untested  portions  of  a  program.  Once  the  program  has  been  debugged,  its 
frequency  counts  show  where  the  "bottlenecks"  are,  and  this  information 
often  suggests  improvements  to  the  algorithm  and/or  data  structures. 

For  example,  we  applied  FORDAP  to  itself,  since  it  was  written  in  FORTRAN, 
and  we  immediately  found  that  it  was  spending  about  half  of  its  time  in 
two  loops  that  could  be  greatly  simplified;  this  made  it  possible  to  double 
the  speed  of  FORDAP,  in  less  than  an  hour's  work,  without  even  looking  at 
the  rest  of  the  program.  (See  Example  2  in  Section  4  below.)  The  same 
thing  happened  many  times  with  other  programs. 

Thus  our  experience  has  suggested  that  frequency  counts  are  so 
important  they  deserve  a  special  name;  let  us  call  the  collection  of 
frequency  counts  the  profile  of  a  program. 

Programs typically have a very jagged profile, with a few sharp peaks.
As a very rough approximation, it appears that the n-th most important
statement of a program from the standpoint of execution time accounts for
about (a-1)a^{-n} of the running time, for some a and for small n. We
also found that less than 4% of a program generally accounts for more than
half of its running time. This has important consequences, since it means
that programmers can make substantial improvements in their own routines
by being careful in just a few places; and optimizing compilers can be
made to run much faster since they need not study the whole program with
the same amount of concentration.
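To see what the rough approximation implies, note that the shares (a-1)a^{-n} for n = 1, 2, 3, ... form a geometric series whose sum is 1, and that the k most important statements together account for 1 - a^{-k} of the running time. The short sketch below simply checks this arithmetic; the function names are hypothetical, and the model is only claimed in the text for small n and some unspecified a > 1.

```python
def share_of_nth(a, n):
    """Fraction of running time attributed to the n-th most important statement."""
    return (a - 1) * a ** (-n)


def share_of_top(a, k):
    """Closed form for the combined share of the k most important statements."""
    return 1 - a ** (-k)
```

For example, with a = 2 the single most important statement would account for half of the running time, and the top three statements for seven eighths of it.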

Table 2 shows how the relative frequency of statement types changes
when the counts are dynamic instead of static; this table was compiled from
the results of 24 FORDAP runs, with the statistics for each program weighted
equally. We did not have time to break down these statistics further
(to discover, for example, the distribution of operators, etc.), except
in one respect: 45% of the assignment statements were simply replacements
(of the form A = B where B is a simple variable or constant), when
counting statically, but this dropped to 35% when counting dynamically.
In other words, replacements tend to occur more often outside of loops
(in initialization sections, etc.).


Table 2. Distribution of executable statements.

Statement      Static (percent)    Dynamic (percent)

Assignment     51                  67
IF             10                  11
GO TO          9                   9
DO             9                   5
CALL           5                   5
WRITE          5                   1
CONTINUE       4                   7
RETURN         4                   5
READ           2                   0
STOP           1                   0
The other approach to dynamic statistics-gathering, based on program
status sampling, tends to be less precise but more realistic, in the sense
that it shows how much time is actually spent in system subroutines. We
used and extended a routine called PROGTIME [10] which was originally
developed by T. Y. Johnston and R. H. Johnson to run on System 360
under MVT. PROGTIME spawns the user program as a subtask, then samples
its status word at regular intervals, rejecting the datum if the program
was dormant since its last interruption. An example of the resulting
"histogram" output appears in Figure 2; it is possible (although not
especially convenient) to relate this to the FORTRAN source text.
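The principle of status sampling can be shown with a small deterministic simulation. This is not PROGTIME, which samples a running subtask's status word under MVT; the sketch below instead assumes that the execution history is already available as a list of (location, duration) pairs and looks at it every `interval` time units, building a histogram of where the program was found. A location of `None` stands for a dormant period, and such samples are discarded, in the spirit of the rejection rule described above. All names are hypothetical.

```python
from collections import Counter


def sample_histogram(trace, interval):
    """Sample a recorded execution at times interval, 2*interval, ...

    trace    -- list of (location, duration) pairs in execution order;
                location None means the program was dormant
    interval -- sampling period, in the same time units as the durations
    """
    histogram = Counter()
    clock = 0
    next_sample = interval
    for location, duration in trace:
        end = clock + duration
        # Every sample instant in (clock, end] finds the program here.
        while next_sample <= end:
            if location is not None:
                histogram[location] += 1
            next_sample += interval
        clock = end
    return histogram
```

With a fine sampling interval the histogram approaches the true distribution of time; with a coarse one it is only an estimate, which is why sampling is described as less precise than frequency counts.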

In general, the results obtained from PROGTIME runs were essentially
what we would have expected from the FORDAP-produced profiles, except for
the influence of input/output editing times. The results of FORDAP would
have led us to believe that the code between relative locations 015928
and 015A28 in Figure 2 would consume most of the running time, but in
fact 70% of the time was spent in those beloved system subroutines
IHCFCOMH and IHCFCVTH (relative locations 016A88 through 019080).
Roughly half of the programs we studied involved substantial amounts of
input/output editing time, and this led us to believe that considerable
gains in efficiency would be achieved if the compilers would do the editing
in-line wherever possible. It was easy to match up the formats with the
quantities to be edited, in every case we looked at. However, we did not
have time to study the problem further to investigate just how much of an
improvement in performance could be expected from in-line editing. Clearly
the general problem of editing deserves further attention, since it seems
to use up more than 25% of the running time of FORTRAN programs in spite
of the extremely infrequent occurrence of actual input/output statements
reflected in Table 2.
Figure 2. Histogram corresponding to a PROGTIME run.

Due to the random nature of the sampling process, two PROGTIME runs
of the same program will not give identical results. It is possible to
get accurate frequency counts and accurate running times by using the
technique of "jump tracing" (see Gear [7, Chapter 5]). A jump trace
routine scans a program down to the next branch instruction, and executes
the intervening code at machine speed; when a branch occurs the location
transferred to is written onto a file. Subsequent processing of the file
makes it possible to infer the frequency counts. The jump trace approach
does not require auxiliary memory for counters, and it can be used with
arbitrary machine language programs. Unfortunately we did not have time
to develop such a routine for Stanford's computers during the limited time
in which our study was performed.
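How frequency counts might be inferred from a jump trace file can be sketched as follows. This is an interpretation, not a description of any actual jump trace routine: it assumes that instructions occupy consecutive addresses 0 to n-1, that the set of addresses holding branch instructions is known statically, and that the file contains the starting address of every straight-line run (the entry point, then the location reached after each branch, whether or not the branch was taken). Each run is then taken to extend from its start to the next branch instruction. All names are hypothetical.

```python
import bisect


def infer_counts(n, branch_addrs, starts):
    """Infer per-instruction execution counts from a jump trace.

    n            -- number of instructions (addresses 0 .. n-1)
    branch_addrs -- sorted addresses that hold branch instructions
    starts       -- logged start address of each straight-line run
    """
    diff = [0] * (n + 1)          # difference array over addresses
    for start in starts:
        i = bisect.bisect_left(branch_addrs, start)
        end = branch_addrs[i] if i < len(branch_addrs) else n - 1
        diff[start] += 1          # run covers start .. end inclusive
        diff[end + 1] -= 1
    counts, running = [], 0
    for delta in diff[:n]:
        running += delta
        counts.append(running)
    return counts
```

For a six-instruction program whose loop body at addresses 1 to 3 is executed three times, the log `[0, 1, 1, 4]` with branches at addresses 3 and 5 yields the counts `[1, 3, 3, 3, 1, 1]`; no counters are needed while the program itself is running.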


4. The Inner Loops

We selected 17 programs at random for closer scrutiny; this section
contains a summary of the main features of these programs. (It is worth
emphasizing that we did not modify the programs nor did we discard programs
that did not produce results in accordance with our preconceived ideas;
we analyzed every routine we met whether we liked it or not! The result is
hopefully a good indication of typical FORTRAN programming practice, and
we believe that a reader who scans these programs will obtain a fairly clear
conception of how FORTRAN is being used.) First the program profile was
found, by running it with FORDAP and PROGTIME. (This caused the chief
limitation on our selection, for we were unable to study programs for
which input data was on inaccessible tapes or otherwise unavailable.) In
each case a glance at the profile reduced the program to a comparatively
small piece of code which represented the majority of the execution time
exclusive of input/output statements. These "inner loops" of the programs
are presented here; the names of identifiers have been changed in order to
give some anonymity, but no other changes have been made.

In each case we hand-translated the inner loop into System/360
machine language, using five different styles of "optimization":

Level 0. Straight code generation according to classical one-pass
compilation techniques.

Level 1. Like level 0 but using local optimizations based on a good
knowledge of the machine; common subexpressions were eliminated
and register contents were remembered across statements if no
labels intervene, etc., and the index of a DO was kept in a
register, but no optimizations requiring global flow analysis
were made.

Level 2. "Machine-independent" optimizations based on global flow
analysis, including constant folding, invariant expression
removal, strength reduction, test replacement, and load-store
motion (cf. Allen [1]).

Level 3. Like level 2 plus machine-dependent optimizations based on
the 360, such as the use of BXLE, LA, and the possibilities
afforded by double indexing.

Level 4. The "best conceivable" code that would be discovered by any
compiler imaginable. Anything goes here except a change in the
algorithm or its data structures.

These styles of optimization are not extremely well defined, but in
each case we produced the finest code we could think of consistent with that
level. (In nearly every case this was noticeably better than the
optimizations produced by the existing FORTRAN compilers; FORTRAN H OPT=2
would presumably be able to reach level 3 if it were carefully tuned.)
Level 4 represents the ultimate achievable, by comparison with what is
realized by current techniques, in an attempt to assess whether or not
an additional effort would be worthwhile.

These  styles  of  optimization  can  best  be  appreciated  by  studying 
Example  1  for  which  our  machine  language  coding  appears  in  the  Appendix 
to  this  paper.  It  is  appropriate  to  restrict  our  attention  solely  to  the 
inner  loop,  since  the  profiles  show  that  the  effect  of  optimization  on 
this  small  part  of  the  code  is  very  indicative  of  the  total  effect  of 
optimization  on  the  program  as  a  whole. 

In  order  to  compare  one  strategy  to  another,  we  decided  to  estimate 
the  quality  of  each  program  by  hand  instead  of  actually  running  them  with 
a timer as in [IS]. We weighted the instructions in a crude but not
atypical manner as follows: Each instruction costs one unit, plus one if
it fetches or stores an operand from memory or if it is a branch that is
taken, plus a penalty for specific slower opcodes:


    Floating add/subtract,   add 1
    Multiply,                add 5
    Divide,                  add 8
    Multiply double,         add 15
    Shift,                   add 1
    Load multiple,           add ½ n  (n registers loaded)
    Store multiple,          add ½ n  (n registers stored)


This evaluation corresponds roughly to 1 unit per 0.7 microseconds on
our model 67 computer. Other machine organizations ("pipelining", etc.)
would, of course, behave somewhat differently, but the above weights
should give some insight. We also assumed the following additional costs
for the time spent in library subroutines (cf. [8]):


    SQRT                85
    SIN, COS           110
    ALOG               120
    ERF                130
    Complex multiply    60
    Real ** Integer     75
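
The weighting rule described above can be expressed as a small scoring routine. The following Python sketch is only an illustration under stated assumptions: the `Instr` record, the `PENALTY` keys, and the four-instruction loop body are hypothetical and are not taken from the Appendix. A fractional `taken` value stands for a branch that is taken only part of the time, which is how non-integer scores can arise.

```python
from dataclasses import dataclass

# Opcode penalties read from the weights quoted in the text; only the
# entries needed for this illustration are included.
PENALTY = {"float_add": 1, "multiply": 5, "shift": 1}


@dataclass
class Instr:
    name: str              # mnemonic, for readability only
    memory: bool = False   # fetches or stores an operand in memory
    taken: float = 0.0     # fraction of executions in which a branch is taken
    kind: str = ""         # key into PENALTY, or "" for an ordinary opcode
    freq: float = 1.0      # executions per iteration of the loop


def cost(i: Instr) -> float:
    units = 1.0                       # every instruction costs one unit
    if i.memory:
        units += 1.0                  # plus one for a memory operand
    units += i.taken                  # plus one when the branch is taken
    units += PENALTY.get(i.kind, 0)   # plus the penalty for a slow opcode
    return i.freq * units


def score(loop):
    return sum(cost(i) for i in loop)


# Hypothetical loop body, chosen only to exercise each rule once.
loop = [
    Instr("LE", memory=True),                    # 1 + 1
    Instr("ME", memory=True, kind="multiply"),   # 1 + 1 + 5
    Instr("SDR", kind="float_add"),              # 1 + 1
    Instr("BXLE", taken=1.0),                    # 1 + 1
]
print(score(loop))
```

Multiplying each cost by an execution frequency lets the same routine handle statements that run only on some iterations.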

Example 1. The first program we studied involved 140 executable statements,
but the following five represented nearly half of the running time:

      DO 2 J = 1,N
      T = ABS(A(I,J))
      IF (T-S) 2,2,1
    1 S = T
    2 CONTINUE

Statement 1 was executed about half as often as the others in the loop.
The programs in the Appendix have a "score" of
      37.5 , 28.5 , 14 , 8 , 7
for levels 0, 1, 2, 3, 4 respectively.

The same program also included another time-consuming loop,

      DO 3 J = 1,N
    3 A(I,J) = A(I,J)*B

for which the respective scores are
      51 , 29 , 17 , 12 , 11 .
In this case level 0 is penalized for calculating the subscript twice.

Example 2. (This came from the original FORDAP program itself.) Although
there were 455 executable statements, over half of the program time was
spent executing two loops like this:

      DO 1 J = 38,53
      IF (K(I).EQ.L(J)) GO TO 3
    1 CONTINUE
    2 ....
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:.<•  !:•.«'  '  ranulnt.  l«ui  give  respective  scores  of 

*  i-  '  •  1  *  ‘  *  •  \>  « 

I '  s  -core  of  ■  in  obtained  in  an  interesting  way  which  applies  to 

several other loops we had examined earlier in the summer; we call it the
technique of combining tests. The array element L(54) is set equal to
K(I), so that the loop involves only one test; then after reaching L3,
if J = 54 we go back to L2. The code is
v;l  LA  :',8(o, ;••) 

c  i»,0-(o,3) 

BKR  h  (Register  5  contains  A(l3)) 

c 

BN  E  Q.1 

I,;'  . . . 


If necessary, L(54) could be restored.
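
The idea of combining tests can be sketched in a higher-level language. The Python fragment below is a minimal illustration, not the hand-coded loop of the paper: the function names and the stand-in table `L` are hypothetical. It stores the search key in the slot just past the searched range, so that the scan needs only one comparison per step, and then restores that slot.

```python
def find_plain(table, lo, hi, key):
    """Search table[lo..hi]; two tests per step (bound and equality)."""
    j = lo
    while j <= hi:
        if table[j] == key:
            return j
        j += 1
    return None


def find_combined(table, lo, hi, key):
    """Same search with one test per step; table[hi + 1] is a spare slot."""
    saved = table[hi + 1]
    table[hi + 1] = key          # sentinel: the scan below always stops
    j = lo
    while table[j] != key:
        j += 1
    table[hi + 1] = saved        # restore the slot afterwards
    return j if j <= hi else None


# Hypothetical stand-in for the L table; indices 38..53 are searched.
L = list(range(100, 160))
print(find_plain(L, 38, 53, 145), find_combined(L, 38, 53, 145))
print(find_plain(L, 38, 53, 154), find_combined(L, 38, 53, 154))
```

The final comparison `j <= hi` plays the role of the check made after leaving the loop, which distinguishes a genuine match from a stop at the sentinel.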

Of  course,  in  this  particular  case  the  loop  is  executed  only  16  times, 
and  so  it  could  be  completely  unrolled  into  32  instructions 

c 

BKR  y 

c  Ujl(59) 

BKR  3 

•  * 

• 

C  U,l(53) 

BKR  5 

reducing the "score" to 3. But in actual fact the L table was loaded
in a DATA statement, and it contained a list of special character codes;
a more appropriate program would replace the entire DO loop by a single
test

IF  (LT(K(I)))  1,3,1 

for  a  suitable  table  LT,  thereby  saving  over  half  the  execution  time  of  the 
program.  (Furthermore,  the  environment  of  the  above  DO  loop  was 
DO  2  I  =  7,72 

so  that  any  assembly  language  programmer  would  have  reduced  the  whole  business 
to a single "translate and test".)

Example 3.

      DOUBLE A,B,D
      DO 1 K = 1,N
      A = T(I-K,1+K)
      B = T(I-K,J+K)
    1 D = D-A*B


(This is one of the few times we observed double precision being used, although
the numerical analysis professors in our department strongly recommend
against the short precision operators of the 360; it serves as another
indication that our department seems to have little impact on the users
of our computer!) The scores for this loop are
      89 , 67 , 38 , 13 , 12 ;
here level 2 suffers from some clumsiness in the indexing and a lack of
knowledge that an ME instruction could be used instead of MD.


Example 4. Here the inner loop is longer and involves a subroutine
call. The following code accounted for 10% of the running time; the entire
program had 214 executable statements.

      DO 1 K = M,20
      CALL RAND(R)
      IF (R .GT. .81) N(K) = 1
    1 CONTINUE
      ......

      SUBROUTINE RAND(R)
      J = I*65539
      IF (J) 1,2,2
    1 J = J+2147483647+1
    2 R = J
      R = R*.4656613E-9
      I = J
      K = K+1
      RETURN
      END


(Here we have a notoriously bad random number generator, which the programmer
must have gotten out of an obsolete reference book; it is another example
of our failure to educate the community.) Conversion from integer to real
is assumed to be done by the sequence
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for  suitable  confer.!.::  of  CTTX:  and  GPMC1.  By  further  adjusting  these 
constants the multiplication by .4656613E-9 ≈ 2^{-31} could be avoided;
but this observation was felt to be beyond the scope of level 4 optimization,
although it would occur naturally to any programmer using assembly language.


The most interesting thing here, however, is the effect of subroutine
linkage, since the long prologue and epilogue significantly increase the
time of the inner loop. The timings for levels 0-3 assume standard OS
subroutine conventions, although levels 2 and 3 are able to shorten the
prologue and epilogue somewhat because of their knowledge of program flow.
For level 4, the subroutine was "opened", placed in the loop without any
linkage; hence the sequence of scores,

110. 9  ,  105-1  >  8l.fc  ,  76.2  ,  27.2 

Without subscripting there is comparatively little difference between
levels 0 and 3; this implies that optimization probably has more payoff
for FORTRAN than we would find for languages with more flexible data structures.

It  would  be  interesting  to  know  just  how  many  hours  each  day  are  spent 
in  prologues  and  epilogues  establishing  linkage  conventions. 


Example 5. The next inner loop is representative of several programs
which had to be seen to be believed.

      DO 1 K = 1,N
      M = (J-1)*10+K-1
      IF (M.EQ.0) M = 1001
      C1 = C1+A1(M)*(B1**(K-1))*(B2**(J-1))
      C2 = C2+A2(M)*(B1**(K-1))*(B2**(J-1))
      IF ((K-1).EQ.0) T = 0.0
      IF ((K-1).GE.1) T = A1(M)*(K-1)*(B1**(K-2))*(B2**(J-1))
      C3 = C3+T
      IF ((K-1).EQ.0) T = 0.0
      IF ((K-1).GE.1) T = A2(M)*(K-1)*(B1**(K-2))*(B2**(J-1))
      C4 = C4+T
      IF ((J-1).EQ.0) T = 0.0
      IF ((J-1).GE.1) T = A1(M)*(B1**(K-1))*(J-1)*(B2**(J-2))
      C5 = C5+T
      IF ((J-1).EQ.0) T = 0.0
      IF ((J-1).GE.1) T = A2(M)*(B1**(K-1))*(J-1)*(B2**(J-2))
      C6 = C6+T
    1 CONTINUE

After staring at this for several minutes, our group decided it did not
deserve to be optimized. But after two weeks' rest we looked at it again
and found interesting applications of "strength reduction", both for the
exponentiations  and  for  the  conversion  of  K  to  real.  (The  latter  applies 
only  in  level  which  knows  that  K  doesn't  get  too  large.)  The  scores 
were 

w  >  >  159 ,  1U5  ,  ion  . 

Level 1 optimization finds common subexpressions, and level 2 finds the
reductions in strength. Level 4 removes nearly all the IF tests and
rearranges the code so that C1 and C2 are updated last; thus only
B1**(K-1) is necessary, not both it and B1**(K-2).
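
The effect of strength reduction on the exponentiations, and of updating the plain sum last, can be sketched as follows. This Python fragment is a simplified illustration under stated assumptions: it keeps a single coefficient list `a` and a single base `b1` (the B2 factor and the second coefficient array are omitted), and the function names are hypothetical. The variable `power` holds b1**(e-1) when the derivative-like sum is updated and b1**e when the plain sum is updated, so only one running power is needed.

```python
def sums_direct(a, b1):
    """Evaluate both sums with an exponentiation in every term."""
    c1 = sum(a[e] * b1 ** e for e in range(len(a)))
    c3 = sum(a[e] * e * b1 ** (e - 1) for e in range(1, len(a)))
    return c1, c3


def sums_reduced(a, b1):
    """Same sums with one multiplication per step instead of powers."""
    c1 = c3 = 0
    power = 1                          # b1**0, ready for the first term
    for e, coeff in enumerate(a):
        if e > 0:
            c3 += coeff * e * power    # power still holds b1**(e-1)
            power *= b1                # now power == b1**e
        c1 += coeff * power            # the plain sum is updated last
    return c1, c3


print(sums_direct([3, 1, 4, 1, 5], 2))
print(sums_reduced([3, 1, 4, 1, 5], 2))
```

Testing `e > 0` once per step also stands in for the pairs of IF statements in the original loop, which guard the same condition repeatedly.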


Example  6.  In  this  case  the  "inner  loop"  involves  subroutine  calls 
instead  of  a  DO  loop: 


      SUBROUTINE S(A,B,X)                         9
      DIMENSION A(2),B(2)                         9
      X = 0                                       9
      Y = (B(2)-A(2))*12+B(1)-A(1)                9
      IF (Y.LT.0) GO TO 1                         9
      X = Y                                       5
    1 RETURN                                      9
      END                                         9
      SUBROUTINE W(A,B,C,D,X)                     4
      DIMENSION A(2),B(2),C(2),D(2),U(2),V(2)     4
      X = 0                                       4
      CALL S(A,D,X)                               4
      IF (X.EQ.0) GO TO 3                         4
      CALL S(C,B,X)                               2
      IF (X.EQ.0) GO TO 3                         2
      CALL S(C,A,X)                               1
      U(1) = A(1)                                 1
      U(2) = A(2)                                 1
      IF (X.NE.0) GO TO 1                         1
      U(1) = C(1)                                 0
      U(2) = C(2)                                 0
    1 CONTINUE                                    1
      CALL S(B,D,X)                               1
      V(1) = B(1)                                 1
      V(2) = B(2)                                 1
      IF (X.NE.0) GO TO 2                         1
      V(1) = D(1)                                 0
      V(2) = D(2)                                 0
    2 CALL S(U,V,X)                               1
    3 CONTINUE                                    4
      RETURN                                      4
      END                                         4


The  numbers  at  the  right  of  this  code  show  the  approximate  relative 
frequency  of  occurrence  of  each  statement;  calls  on  this  subroutine 
accounted  for  (O^  cf  the  execution  time  of  the  program.  The  scores  for 
various  optimization  styles  are 

1545.5  ,  1037.5  ,  755.3  ,  736.3  ,  289  . 

Here 270 of the 1545.5 units for level 0 are due to repeated conversions
of the constant 0 from integer to real. Levels 2 and 3 move the first
statement "X = 0" out of the main loop, performing it only if "Y.LT.0".

The big improvement in level 4 comes from inserting the code for subroutine
S in line and making the corresponding simplifications. Statements like
U(1) = A(1), U(2) = A(2) become simply a change in base register.

Perhaps further reductions would be possible if the context of subroutine W
were examined, since if we denote 12*A(1)+A(2) by a, 12*B(1)+B(2) by b,
etc., the subroutine computes max(0, min(b,d)-max(a,c)).
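
The closed form quoted above can be checked mechanically. The following Python sketch is a transliteration made for illustration only: the function names are hypothetical, pairs are 0-indexed tuples, and `key` weights the second component by 12 because that is what the arithmetic in subroutine S does as printed. It compares the branching logic of W with max(0, min(b,d)-max(a,c)) on random inputs.

```python
import random


def s(a, b):
    """Subroutine S: the weighted difference b - a, or 0 if negative."""
    y = (b[1] - a[1]) * 12 + b[0] - a[0]
    return y if y >= 0 else 0


def w(a, b, c, d):
    """Subroutine W, following its branches statement by statement."""
    if s(a, d) == 0:
        return 0
    if s(c, b) == 0:
        return 0
    u = a if s(c, a) != 0 else c     # the later of a and c
    v = b if s(b, d) != 0 else d     # the earlier of b and d
    return s(u, v)


def key(p):
    return 12 * p[1] + p[0]


def closed_form(a, b, c, d):
    return max(0, min(key(b), key(d)) - max(key(a), key(c)))


def random_pair(rng):
    return (rng.randint(1, 12), rng.randint(0, 5))


rng = random.Random(1970)
trials = [tuple(random_pair(rng) for _ in range(4)) for _ in range(2000)]
print(all(w(*t) == closed_form(*t) for t in trials))
```

Seen this way, the routine measures the overlap of two intervals, which is the simplification the text suggests a compiler with knowledge of the context might exploit.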


Example 7. In this program virtually all of the time exclusive of
input/output editing was spent in the two loops

      DO 1 I = 1,N
      A = X**2+Y**2-2.*X*Y*C(I)
      B = SQRT(A)
      K = 100.*B+1.5
    1 D(I) = S(I)*T(K)
      Q = D(1)-D(N)
      DO 2 I = 2,M,2
    2 Q = Q+4.*D(I)+2.*D(I+1)

where array D was not used subsequently. The scores are

744  ,  ;.B7  ,  ;'1C  ,  2<)2  ,  2'jh  . 

Here level 1 computes X**2 by "MER 0,0" instead of a subroutine call,

and  it  computes  -2. X0(:H l)  by  "AER  0,0"  instead  of  multiplying.  Level  4 
combines  the  two  DO  loops  into  one  and  eliminates  array  D  entirely. 

(Such  savings  in  storage  space  were  present  in  quite  a  few  programs  we 
looked  at;  some  matrices  could  be  reduced  to  vectors,  and  some  vectors 
could  be  reduced  to  scalars,  due  to  the  nature  of  the  calculations. 

A  quantitative  estimate  of  how  much  space  could  be  saved  by  such  optimization 
would  be  interesting.) 


Example 8. Ninety percent of the running time of this program was spent
in the following subroutine.

      SUBROUTINE COMPUTE
      COMMON ....
      COMPLEX Y(10),Z(10)
      R = REAL(Y(N))
      P = SIN(R)
      Q = COS(R)
      S = C*6.*(P/3.-W*P)
      T = 1.414214*P*P*Q*C*6.
      U = T/2.
      V = -2.*C*6.*(P/3.-Q*Q*P/2.)
      Z(1) = (0.,-1.)*(S*Y(1)+T*Y(2))
      Z(2) = (0.,-1.)*(U*Y(1)+V*Y(2))
      RETURN
      END

This was the only example of complex arithmetic that we observed in our
study. The scores
      841.5 , 735.5 , 336 , 336 , 249

reflect  the  fact  that  levels  0  and  1  make  six  calls  on  the  complex-multiply 
subroutine,  while  levels  2  and  3  expand  complex  multiplication  into  a 
sequence  of  real  operations  (with  obvious  simplifications) .  Level  4  in 
this  analysis  makes  free  use  of  the  distributive  law,  e.g. 
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S  -  •J'!'"  .  *0.  'Q)  ,  although  tills  may  not  be  numerically  justified. 

Furthermore  level  4  assumes  the  existence  of  a  single  "SINCOS(R)" 
subroutine  that  computes  both  the  sine  and  cosine  of  its  argument  in 
1-  5  units  of  time;  programmers  who  calculate  the  sine  of  an  angle  usually 
want  to  know  its  cosine  too  and  vice  versa,  and  it  is  possible  to  calculate 
both  in  somewhat  less  time  than  would  be  required  to  compute  them 
individually. 


Example 9. A program with 245 executable statements spent 70 percent of
its time in

      DO 2 K = 1,M
      DO 2 J = 1,M
      X = 0.
      Y = 0.
      DO 1 I = 1,M
      N = (J+J+(I-1)*M2)
      B = A(K,I)
      X = X+B*Z(N)
    1 Y = Y+B*Z(N-1)
      DY(L) = W*X
      DY(L+1) = -W*Y
    2 L = L+2

when M was only 5. Scores (for the innermost I loop only) are
      84 , 69 , 30 , 24 , 24 ,
reflecting the fact that level 4 cannot do anything for this case.


Example 10. In this excerpt from a contour plot routine, the CALL is only
done rarely:

      DO 1 I = L,M
    1 IF (X(I-1,J).LT.Q .AND. X(I,J).GE.Q) CALL S(A1,A2,A3,A4,7,A5)

The scores, assuming that X(I).LT.Q about half the time, are
      40 , 31.5 , 14.5 , 7.5 , 5 .
Level 3 keeps Q in a register, while level 2 does not. Level 4 is
especially interesting since it avoids testing X(I-1,J).LT.Q in
those cases where it is known to be true from the previous loop. We
had noticed similar situations in other routines.
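
One way to avoid repeating a comparison whose outcome is already known is to carry the result forward from one iteration to the next. The Python sketch below illustrates that idea only; it is not the level 4 code of the study, and the function names are hypothetical. Each element is compared with Q exactly once.

```python
def crossings_plain(x, q):
    """Indices i with x[i-1] < q <= x[i]; up to two comparisons per step."""
    return [i for i in range(1, len(x)) if x[i - 1] < q and x[i] >= q]


def crossings_carry(x, q):
    """Same result, reusing the previous comparison on each step."""
    out = []
    below = x[0] < q                # outcome of the test on the previous element
    for i in range(1, len(x)):
        now_below = x[i] < q        # the only comparison made in this step
        if below and not now_below:
            out.append(i)
        below = now_below
    return out


data = [5, 1, 7, 2, 2, 9, 0]
print(crossings_plain(data, 3), crossings_carry(data, 3))
```

In compiled code the carried flag need not exist as a variable; it can be encoded in which copy of the loop body is currently executing.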


Example 11. This "fast Fourier transform" example shows that inner
loops aren't always signalled by the word "DO".

    1 K = K+1
      A1 = A(K)*C(J)+A1
      B1 = B(K)*C(J)+B1
      K = K+1
      A2 = A(K)*S(J)+A2
      B2 = B(K)*S(J)+B2
      J = J+I
      IF (J.GT.M) J = J-M
      IF (K.LT.M) GO TO 1

The scores are
      118 , 91 , 60 , 54 , 50 ;
level 4 is able to omit the second "K = K+1", and to use a BXLE for "J=J+I".


Example  12.  Unfortunately  an  inner  loop  is  not  always  as  short  as  we  had 

hoped.  This  rather  long  program  (1300  executable  statements)  spent  about 

half  of  its  time  in  the  following  rather  horrible  loop. 

DO  3  I  =  1,M 
JO  =  J1 

IF  (JO.EQ.O)  JO  *  J2 
J1  =  Jl+1 
J5  =  J3+1 
j4  *  J4+1 

IF  (JU.EQ.(L(J-1)+1))  J4  =  1 
J5  *  Jl+1 

IF  (J5.EQ,. (J2+1) )  J5  -  1 
U1  =  U(J1,K1,K2) 

VI  =  V(J1,K1,K2) 

W1  =  W(J1,K1,K2) 

P(J1)  =  .25*(Q1(I)*(V1+V(J3,K3,K2))*(W1+W(J3,K3,K2)) 

+Q2  (I)  *(  V1+ V(  J3+1*  K3,  K2) )  *(W1+W  ( J3+1,  K3,  K2) ) 

-Q3  (I)  *(  Vl+V( j4,  K*,  K2)  )  *(W1+W(  j4,  K4,  K2) ) 

+D*(  (Ul+U(  J5#  Kl,  K2)  )  *(W1+W(  J5,  Kl,  K2)  ) 

-  (U1+U(J0,  Kl,  K2)  )  *(W1+W(  JO,  Kl,  K2)  )  )  ) 

+R1(J1,  Kl)  *R2(  K8) *( S ( Jl,  K2+ 1)  #  (Wl+W(  Jl,  Kl,  K2+1)  ) 

-S  ( Jl,  K2)  *  (Wl+W(  Jl,  Kl,  K2-1)  )  ) 
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:  !■■  (,  i .  it; .  .1)  ;o  to  i 

j4  -.1. 

!!•’  («!■  .rXl.o)  ,1*  L(J-l) 

«’( 'I  0  l’(.M-)  -  'Q4 ( l)  y  (Vl+V(jt ■,  K4 ,  K2)  )*(W1+W( Jto,  &4,  K2) ) 

;o  to 

!  IT'  (M.ai.l)  0,0  TO  0 

P(.ri)  P(.T1)  +  .:V, «q)i  ( I)  v(  V1+  V(  J3-1,  K3,  K2) ) *(W1+W(  J3-1, K3,  K2)  ) 

00  TO  • 

I‘(  Jl)  P(.T1)  +  .23yQ4 (l)  *(  Vl+V(  J2+4,  K5,  K2)  )  *(W1+W(  J2+4,  K3>  K2)  ) 

CONTINUE 


Here levels 2 and 3 have just enough registers to maintain all the
necessary indices; the scores are

79;-  >  :''>3  ,  242  ,  258  ,  20?  . 

Level 4 observes that J6 can more easily be computed by "J6 = J4" before J4
is changed; and the Q4(I) terms are included as if they were conditional
expressions within the big formula for P(J1).


Example 13. Here is a standard "binary search" loop.

      I = 0
      K = N+1
    1 J = (I+K)/2
      IF (J.EQ.I) GO TO 5
      IF (X(J)-XO) 2,4,3
    2 I = J
      GO TO 1
    3 K = J
      GO TO 1
    4 ...

The scores
      33 , 33 , 27 , 21 , 10
for the inner loop are of interest primarily because level 4 was able to
beat level 3 by a larger factor than in any other example (except where
subroutines were expanded in-line). The coding for level 4 in this case
consisted of six packets of eight lines each, one for each permutation of
the three registers α, β, γ:
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LI* *(3/  I A  7>‘>(  1,3) 

r.RL  7,1 

NR  7,3 
f-K  7 .  * 

BE 

CM  0,X(>) 
liL  LI73  i 
BK  l'J'7 
Lla/3  . . . 


Here 4I, 4J, 4K are respectively assumed to be in registers α, γ, β;
register 8 contains -4. Division by 2 can be reduced to a shift since
it is possible to prove that I, J, K are nonnegative. Half of the
"CR  7, Of;  BE  I/yl"  could  have  been  removed  if  X(0)  were  somehow  set 
to  •  this  would  save  another  10^. 

Actually the binary search was not the inner loop in the program we
analyzed, although the programmer (one of our group) had originally thought
it would be! The frequency counts showed that his program was actually
spending most of its time moving entries in the X table, to keep it in order
when new elements were inserted. This was one of many cases we observed
where a knowledge of frequency counts immediately suggested vital improvements,
by directing the programmer's attention to the real bottlenecks in his
program. Changing to a hash-coding scheme made this particular program
run about twice as fast.
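
For reference, the FORTRAN loop of Example 13 can be transliterated into Python as follows. This is a sketch under stated assumptions: the function name is hypothetical, the table is kept 1-indexed as in the original (slot 0 is unused), and the unspecified statements 4 and 5 are taken to mean "found" and "not found".

```python
def search(x, n, x0):
    """Binary search in x[1..n], sorted ascending; x[0] is unused."""
    i, k = 0, n + 1
    while True:
        j = (i + k) // 2         # statement 1
        if j == i:
            return None          # GO TO 5: interval exhausted
        if x[j] < x0:
            i = j                # statement 2
        elif x[j] > x0:
            k = j                # statement 3
        else:
            return j             # statement 4: found


table = [None, 2, 3, 5, 7, 11]
print([search(table, 5, v) for v in (2, 7, 11, 4)])
```

The invariant is that any matching index lies strictly between `i` and `k`, so the loop stops as soon as the two bounds are adjacent.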


Examples 14-17. From this point on the programs we looked at began to
seem rather repetitious. We worked out four more examples, summarized
here with their scores.

      DO 1 I = 1,N
      C = C/D*R
      R = R+1.
    1 D = D-1.                [45 , 42 , 27 , 21 , 20]

DO  1  J  =  I,N 

H(I,J)  =  H(I,J)+S(I)*S(J)/D1-S(K+I)*S(K+J)/D2 
1  H(J,I)  =  H(I,J) 

\1%  ,  103  ,  58  ,  49  ,  41.5] 
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REAL  FICTION  F(X) 

V  -•  >:>. 70710-3 
IF  CY.LT .0.0)  GO  TO  1 
v  ■  0.5*(1.0fERF(Y))  /  . 

RETURN  J  low  frequency 

1  ?  =  1 .0-0 . 5*  ( 1 .0  *-ERF(  -Y) ) 

RETURN 

END 

[219.5  ,  208.5  ,  191.3  ,  191.3  >  151] 

      DO 1 I = 1,N
    1 A = A+B(I)+C(K,I)       [41 , 31 , 14 , 9 , 8]

(The  latter  example  is  the  loop  from  015928  to  015A28  in  Figure  2.) 

Cursory  examination  of  other  programs  led  us  to  believe  that  the  above 
seventeen  examples  are  fairly  representative  of  the  programs  now  being 
written  in  FORTRAN,  and  that  they  indicate  the  approximate  effects 
achievable  with  different  styles  of  optimization  (on  our  computer).  Only 
one  of  the  other  programs  we  looked  at  showed  essentially  different 
characteristics,  and  this  one  was  truly  remarkable;  it  contained  over  700 
lines  of  straight  calculation  (see  the  excerpts  in  Figure  3)  involving 
no loops, IF's or GO's! This must be some sort of record for the length
of  program  text  without  intervening  labeled  statements,  and  we  did  not 
believe  it  could  possibly  be  considered  typical. 

All  but  one  of  the  DO  loops  in  the  above  examples  apparently  have 
variable  bounds,  but  in  fact  the  compiler  could  deduce  that  the  bounds  are 
actually constant in most cases. For instance in Example 17, N is set
equal  to  S05  at  the  beginning  of  the  program  and  never  changed  thereafter. 

Table 3 summarizes the score ratios obtained in these examples;
0/1 denotes the ratio of the score for level 0 to the score for level 1,
etc.
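
Since each ratio is simply a quotient of two scores, the entries for any example can be recomputed from its list of five scores. The Python sketch below is an illustration only: the function name is hypothetical, and the sample scores are those read from Example 8.

```python
def ratios(scores):
    """scores[i] is the score at level i, for i = 0..4."""
    pairs = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 4), (2, 4), (3, 4)]
    return {f"{a}/{b}": scores[a] / scores[b] for a, b in pairs}


example_8 = [841.5, 735.5, 336, 336, 249]
for name, value in ratios(example_8).items():
    print(f"{name}: {value:.1f}")
```

A ratio near 1 in the 3/4 column means that the machine-dependent level 3 code is already close to the best conceivable code for that loop.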

It may be objected that measurement of the effects of optimization
is impossible since programmers tend to change the style of their FORTRAN
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J23  =  -E  S 12  T  *  S  E  TN  ♦  ES12B*$EBN 
U24  =  -ES22T*SETfs  «  ES  2  2  C*S  E  BN 
UiC  *  EStdT^Srn  4  h  S6  6  P*  SEEN 
U  31  -  - ES66  T  ♦  S  C Tiii  4  ES66Q  *  S  E  0  N 

V3T  =-2 . * (  (  E  El  1  T  •*■  N*  C  S  1 2  T  )  *  s  XT  4  (  p*  ES22  T  ♦  ES  12  T  )  *S  YT  )  *C2XC  2Y 
1  -2.*DSCKI  (i-O*  (T*ES66T* S2XS2Y 

V3H  *-2.M  <  Etll  P4f*U>12E)*SX:34<  w*ES22c)4fcS  12E  )  *S  Yd)  *02X0  2Y 
1  -  2  .  *0  SOP  T  (  M  )  *  U)*EStt8*S2X32Y 

V4T  =-d.»((fc$lir4M4(S12T)*SXT4<M*ES22T4ES12T)*5YT)*C4XC4Y 
1  -E.*0SQKT  (M)*TT*ES66T*S4XS4Y 

V4H  -  -8  .*(<  CSLlt?4K*ES12D)*SXB*-(  M*E  S  2  2B*  E  S  12B  )  *  S  YU  )  *C  4  *C  4Y 
1  -fl.*0SCRT  (M  ■OT  e*FS66e*S4XS4Y 

V5T  --?.«(<<;. *ESUT«**ES12T  )  *S  X  T+  {  M*E  S2  2  T+9.*F  S12T)  *  SYT  )  *C2  XC6  Y 
1  -6„*U iCE I (M)*TT* ESo6T*i2Xi6Y 


A  (  3  )  =  -All*ML2*2.*Xl  1  -  4.  *A  22*ML2»4.»  XI  2  -  41  3*  ML2*2  .  *X  II 
1  +Tl*fcA.*xl3 

c  -  TML  2  C*  <  A  l  l  ♦  “A22  ♦  A13) 

3  -4  .♦t-Y*SB82*Ml  2 

4(a)  =  -ML2* ( ^.*XI 1* ( XI2+XI3) -BETA*X ll/LSC  >-All*02S 

1  -TML  2  0*  (  X  I  1/4.  -»XI2  ♦  X(  3) 

2  ♦  (HX*  <  K  1  l):*M*K12  BX  •  *SAl 1  4M*HY  *  (H »K  22  e*K  1  2rt  Y  ) +SB  l  l 

i  ♦HXY«m*k*6H*SC  11) /2. 

4  +Y3 

A  (  5 )  -  -ML  2*  X 1 2 i«  k  0.  -  A22* C2 S* 16  . 

1  -  TPL20*2.«X  13 

2  4Y4 


0.(14,16)  =  4Y1315 
E'  (  14,17)-  *  Y  l  i  1 1 
e  ( 15 ,14  )  =  2(14,15) 

0(  It, 14)-  B (  14,  It) 

0(17,  14)=  e(l4,  17) 

0(15,1)  =  0. 

0(15,2)  =  0. 

3(13,3)  =  —4  .  *M12  *1- Y 

0(15,15)  =  -HY*rtY*M  SJ*D  lid /(  2. *08)  *■  Y1414 
0(15,16)=  Y  l  4  1  5 
8(15,17)=  YUlfc 
0(16,15)=  e<15,  16) 

0(17,15)=  d  <15 ,17) 

0  (  16,  1  )=  4ML  2  *H  X  Y 
d  ( 1 1  ,2  )  =  0. 

B (  It, 3)  =  0. 

8(16,16)  *  -0XY*HXY*M/ <4.*C£6B)  4  Y1515 


Figure 3. Excerpts from a remarkable program.


Table 3. Execution speed ratios with various types of optimization.
(Columns: Example, 0/1, 0/2, 0/3, 0/4, 1/4, 2/4, 3/4.)


programs when they know what kind of optimizations are being done for
them. However, the programs we examined showed no evidence that the
programmers had any idea what the compiler does, except perhaps the
knowledge that "1" is or is not converted to "1.0" at compile time
when appropriate. Therefore we expect that such feedback effects are
very limited.

Note that level 3 and level 4 programs ran 4 or more times as fast
as level 0 programs, in about half of the cases. Level 3 was not too far
from level 4 except in Examples 4 and 6 where short subroutine code was
expanded in line; by incorporating this technique and the idea of
replicating short loops, level 3 would come very close indeed to the
"ultimate" performance of level 4 optimization. (Before conducting this
study, the author had expected a much greater difference between levels
3 and 4 and had been experimenting with some more elaborate schemes for
optimization, capable of coming close to the level 4 code in the binary
search example above. But the sample programs seem to show that existing
optimization techniques are good enough, on our computer at least.)

Summary  and  Conclusions 

Compiler  writers  should  be  familiar  with  the  nature  of  programs 
their  compiler  will  have  to  handle.  Besides  constructing  "best  cases"  and 
"worst  cases"  it  is  a  good  idea  to  have  some  conception  of  "average 
cases".  We  hope  that  the  data  presented  in  this  paper  will  help  to  give 
a  reasonably  balanced  impression  of  the  programs  actually  being  written 
today.

Of  course  every  individual  program  is  atypical  in  some  sense,  yet 
our  study  showed  that  a  small  number  of  basic  patterns  account  for  most 
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i  < ••  l  *  :  ,.*t !.t •  a  vi  ■ 

’  :.nn  !c  v.  •  allied  wi  t 
•'f  I'OKDgA;!  eompU evs 
i  '.iv  sample  may 
»■>!'  hue  wo  rid  wi.l.L  oi 


";lv‘  me*  i o:n:  i:i  ncu.  Perhaps  these  programs  can  be 
wa !. is!  ie  eompnrison  ot“  compiler  and  machine  speeds 
i.  t  :*.«•  '\;AMM  teat.”  [!'/'!•  See  also  ]•’.  Bryant's  comparison 
summnri  zed  in  pi.  Appendix  pp.  'P'h-'fGj  ] . 
not  be  correct,  and  so  wo  hope  people  in  other  parte 
ndue:  similar  experiments  in  order  to  see  if  independent 


at-* ali.ee  ioid  comparable  results. 

While gathering these statistics we became convinced that a comparatively
simple change to the present method of program preparation can make
substantial improvements in the efficiency of computer usage. The program
profiles (i.e., collections of frequency counts) which we used in our
analyses turned out to be so helpful that we believe profiles should be
made available routinely to all programmers by all of the principal
software systems.

The "ideal system of the future" will keep profiles associated with
source programs, using the frequency counts in virtually all phases of a
program's life. During the debugging stage, the profiles can be quite
useful, e.g., for selective tracing; statements with zero frequency
indicate  untested  sections  of  the  program.  After  the  program  has  been 
debugged  it  may  already  have  served  its  purpose,  but  if  it  is  to  be  a 
frequently  used  program  the  high  counts  in  its  profile  often  suggest 
basic  improvements  that  can  be  made.  An  optimizing  compiler  can  also  make 
very  effective  use  of  the  profile,  since  it  often  suffices  to  do  time 
consuming  optimization  on  only  one  tenth  or  one  twentieth  of  a  program. 

The  profile  can  also  be  used  effectively  in  storage  management  schemes. 
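
A profile in the sense used here, a table of how often each statement is executed, can be collected in a few lines in a language that exposes a tracing hook. The following Python sketch is only an illustration of the idea and has nothing to do with FORDAP itself: the `profile` helper and the sample routine `largest_abs` are hypothetical, and the sketch relies on the standard `sys.settrace` facility.

```python
import sys
from collections import Counter


def profile(func, *args):
    """Run func(*args) and count how often each of its lines executes."""
    counts = Counter()
    code = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None              # do not trace other functions
        if event == "line":
            # Key each count by the line's offset from the "def" line.
            counts[frame.f_lineno - code.co_firstlineno] += 1
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)       # always restore the earlier hook
    return result, counts


def largest_abs(row):
    s = 0
    for t in row:
        t = abs(t)
        if t > s:
            s = t
    return s


result, counts = profile(largest_abs, [1, -3, 2, -5, 4])
for offset in sorted(counts):
    print(offset, counts[offset])
```

Lines whose offsets never appear in `counts` were not executed at all, which corresponds to the "zero frequency" statements that indicate untested sections of a program.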

In early days of computing, machine time was king, and people worked
hard to get extremely efficient programs. Eventually machines got larger
and faster, and the payoff for writing fast programs was measured in minutes
or seconds instead of hours. Moreover, in considering the total cost of
computing, people began to observe that program development and maintenance
costs often overshadowed the actual cost of running the programs. Therefore
most of the emphasis in software development has been in making programs
easier to write, easier to understand, and easier to change. There is no
doubt that this emphasis has reduced total system costs in many installations;
but there is also little doubt that the corresponding lack of emphasis on
efficient code has resulted in systems which can be greatly improved, and
it seems to be time to right the balance. Frequency counts give an
important  dimension  to  programs,  showing  programmers  how  to  make  their 
routines  more  efficient  with  comparatively  little  effort.  A  recent  study 
['>]  showed  that  this  approach  led  to  an  eleven-fold  increase  in  a  particular 
compiler's  speed.  It  appears  useful  to  develop  interactive  systems  which 
tell  the  programmer  the  most  costly  parts  of  his  program,  and  which  give 
him positive reinforcement for his improvements so that he might actually
enjoy making the changes! For most of the examples studied in Section 4
we  found  that  it  was  possible  for  a  programmer  to  obtain  noticeably  better 
performance  by  making  straightforward  modifications  to  the  inner  loop  of 
his  FORTRAN  source  language  program. 

In  the  above  remarks  we  have  implicitly  assumed  that  the  design  of 
compilers  should  be  strongly  influenced  by  what  programmers  want  to  do. 

An  alternate  point  of  view  is  that  programmers  should  be  strongly  influenced 
by  what  their  compilers  do;  a  compiler  writer  in  his  infinite  wisdom  may 
in  fact  know  what  is  really  good  for  the  programmer,  and  would  like  to  steer 
him  towards  a  proper  course.  This  viewpoint  has  some  merit,  although  it  has 
often  been  carried  to  extremes  in  which  programmers  have  to  work  harder  and 
make  unnatural  constructions  just  so  the  compiler  writer  has  an  easier  job. 


are  supplied  Lo  a  programmer,  it  will 


Wr  i  . r.i 


:  C 


fear 


'■;a:  which  aspects  ot'  a  laiijU-iage  the  implementor 
efficiently:  the  reporting  of  this  information 
wa..  to  exert  a  positive  influence  on  the  users  of 


a  lar.  ;:i  v. 

V:.o  !v«u It  a  of  our  study  suggest  several  avenues  for  further  research. 

:  or  example,  add 3  *.  Lonal  static  and  dynamic  statistics  should  be  gathered 
whicr.  are  more  meaningful  with  respect  to  local  optimizations.  A  more 
sophisticated  study  of  these  statistics  would  also  be  desirable. 

Our survey seems to have given a reasonably clear picture of FORTRAN
as it is now used; other languages should be studied in a similar way, so
that software designers can conceptualize the notion of "typical" programs
in COBOL, ALGOL, PL/I, LISP, APL, SNOBOL, etc.

We found that well-done optimization leads to at least a 4- or 5-fold
increase in program speed (exclusive of input/output editing) over straight
translation, in about half of the programs we analyzed. This figure is
based on a computer such as the 360/67 at Stanford, and it may prove to
be somewhat different on other configurations; it would be interesting
to see how much different the results would be if the seventeen examples
were worked out carefully for other types of computers. Furthermore,
a study of the performance gain which would be achieved by in-line format
editing is definitely called for.

As we discussed the example programs we saw many occasions where it
is natural for compiler optimization to be done interactively. The programmer
could perhaps be asked in Example 11 whether or not J will be nonnegative
and less than [...] throughout the loop (so that J = J+I can be done
with a "load address" instruction); in Example S he might be asked whether
the distributive law could be used on his formulas; in Example 7 he
might be asked if X**2+Y**2 can ever overflow (if not, this calculation
may be taken out of the loop); and so on.
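
The overflow question matters because moving a calculation out of a loop changes when (and whether) it is evaluated. The Python sketch below is an illustration only; the function names are hypothetical and the loop body is invented for the purpose. It relies on the fact that Python raises `OverflowError` when a floating-point `**` overflows.

```python
def scale_in_loop(values, x, y):
    """Recompute x**2 + y**2 on every iteration, as the source is written."""
    out = []
    for v in values:
        out.append(v * (x**2 + y**2))
    return out


def scale_hoisted(values, x, y):
    """Compute the invariant x**2 + y**2 once, before the loop."""
    r2 = x**2 + y**2
    return [v * r2 for v in values]


def choose_version(programmer_says_no_overflow):
    """Mimic the interactive question: hoist only when overflow is ruled out."""
    return scale_hoisted if programmer_says_no_overflow else scale_in_loop


# Ordinary data: both versions agree.
print(scale_in_loop([1.0, 2.0], 3.0, 4.0))
print(scale_hoisted([1.0, 2.0], 3.0, 4.0))

# Zero iterations with a huge x: the original never evaluates x**2,
# but the hoisted version does, and fails.
print(scale_in_loop([], 1e200, 0.0))
try:
    scale_hoisted([], 1e200, 0.0)
except OverflowError:
    print("hoisted version overflowed before the loop started")
```

A compiler cannot see from the source text alone that the second situation never arises, whereas the programmer usually can; this is the information the interactive question is meant to obtain.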


As the reader can see, there is considerable work yet to be done on
empirical studies of programming, much more than we could achieve in one
summer.
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Appendix.  Examples  of  hand  translation 

The following code was produced from

          DO 2 J = 1,N
          T = ABS(A(I,J))
          IF (T-S) 2,2,1
        1 S = T
        2 CONTINUE

using the various styles of hand translation described in Section 1. Only
the inner loop is shown, not the initialization.
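
For readers unfamiliar with the arithmetic IF, the loop can be restated as follows. This Python rendering is a sketch for orientation only: the function name `row_max_abs` is hypothetical, the indices are 0-based rather than FORTRAN's 1-based, and the initial value of S is passed in because the initialization is not part of the fragment.

```python
def row_max_abs(a, i, n, s=0.0):
    """Largest absolute value among a[i][0..n-1], starting from s."""
    for j in range(n):                 # DO 2 J = 1,N
        t = abs(a[i][j])               # T = ABS(A(I,J))
        # IF (T-S) 2,2,1 : negative or zero goes to statement 2,
        # positive goes to statement 1.
        if t - s > 0:
            s = t                      # 1  S = T
        # 2  CONTINUE
    return s


a = [[1.0, -4.0, 2.5],
     [0.5, 3.0, -9.0]]
print(row_max_abs(a, 0, 3))
print(row_max_abs(a, 1, 3))
```

Statement 1 is executed only when a new maximum is found, which is why the listings attach a reduced frequency to the instructions that implement it.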

Level 0.                             Cost

    Q1    ST    5,J                  2
          L     3,J                  2
          M     2,=A(AROWS)          7
          A     3,I                  2
          SLL   3,2                  2
          LE    0,A(3)               2
          LPER  0,0                  1
          STE   0,T                  2
          LE    0,T                  2
          SE    0,S                  3
          BNH   L2                   1.5
          B     L1                   2 x .5
    L1    LE    0,T                  2 x .5
          STE   0,S                  2 x .5
    L2    L     5,J                  2
          A     5,=F'1'              2
          C     5,N                  2
          BNH   Q1                   2

A "dedicated" use of registers, and a straightforward statement-by-statement
approach, are typical of level 0.

Level 1.

    Q1    ST    5,J                  2
          LA    3,AROWS              1
          MR    2,5                  6
          A     3,I                  2
          SLL   3,2                  2
          LE    0,A(3)               2
          LPER  0,0                  1
          STE   0,T                  2
          [...]

Note the use of LA and MR, the knowledge of register contents, and the
removal of the redundant branch. The redundant LE in location L1 is still
present because the occurrence of a label potentially destroys the
register contents.

Level 2.

    Q1    LE    0,0(0,5)             2
          LPER  0,0                  1
          LER   [...],0              1
          SER   [...],[...]          2
          BNH   L2                   1.5
    L1    LER   [...],0              1 x .5
    L2    A     5,=A(AROWS*4)        2
          C     5,SPEC               2
          BNH   Q1                   2

Here SPEC contains the precomputed address of A(I,N); S is maintained
in floating register [...].


Level 3.

    Q1    LE    0,0(0,3)             2
          LPER  0,0                  1
          CER   0,2                  1
          BNHR  6                    1.5
    L1    LER   2,0                  1 x .5
    L2    BXLE  3,4,Q1               2

Register 6 is preloaded with the address of L2 (for a microscopic
improvement), and registers 4 and 5 are preloaded with appropriate values
governing the BXLE.
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Level 4.

    Q1    LE    0,0(0,3)             2 x .5
          LPER  0,0                  1 x .5
          CER   0,2                  1 x .5
          BNHR  2                    1.5 x .5
    L1.1  LER   2,0                  1 x .25
    L2.1  LE    0,[...](0,3)         2 x .5
          LPER  0,0                  1 x .5
          CER   0,2                  1 x .5
          BNHR  6                    1.5 x .5
    L1.2  LER   2,0                  1 x .25
    L2.2  BXLE  3,4,Q1               2 x .5

Since the loop program is so short it has been duplicated, saving half of
the BXLE's, when proper initialization and termination routines are
appended. (The code would have been written

    Q1    LE    0,0(0,3)
          LPER  0,0
          CER   0,2
          BHR   2
    L2.1  LE    0,[...](0,3)
          LPER  0,0
          CER   0,2
          BHR   6
    L2.2  BXLE  3,4,Q1
          ...
    L1.1  LER   2,0
          B     L2.1
    L1.2  LER   2,0
          B     L2.2

if the frequency counts of this program would have given less weight to
statement 1.)
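
The cost columns combine an instruction cost with a relative frequency (for instance "1 x .5" for an instruction executed on half of the iterations), so the expected cost per element is the sum of the products. The Python sketch below carries out that arithmetic for Levels 3 and 4. It is an illustration only: the figures are transcribed as we read them from the listings, a frequency of 1 is assumed wherever none is printed, and the names are hypothetical.

```python
# (mnemonic, cost, relative frequency per element of the row)
LEVEL_3 = [
    ("LE", 2, 1.0),
    ("LPER", 1, 1.0),
    ("CER", 1, 1.0),
    ("BNHR", 1.5, 1.0),
    ("LER", 1, 0.5),    # S = T, assumed to happen half of the time
    ("BXLE", 2, 1.0),
]

# One copy of the duplicated loop body; each copy serves every other element.
HALF_BODY = [
    ("LE", 2, 0.5),
    ("LPER", 1, 0.5),
    ("CER", 1, 0.5),
    ("BNHR", 1.5, 0.5),
    ("LER", 1, 0.25),
]
LEVEL_4 = HALF_BODY + HALF_BODY + [("BXLE", 2, 0.5)]


def expected_cost(listing):
    """Sum of cost times frequency over all instructions in the listing."""
    return sum(cost * freq for _name, cost, freq in listing)


print("Level 3:", expected_cost(LEVEL_3))
print("Level 4:", expected_cost(LEVEL_4))
```

Under these assumptions the difference between the two totals is exactly one unit, the cost of the half of the BXLE instructions that duplication removes.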

Note that the FORTRAN convention of storing arrays by columns would
make these loops rather inefficient in a paging environment; a compiler
should make appropriate changes to the storage mapping function for arrays
in such a case.
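
The effect of the storage mapping function on paging can be made concrete with a small calculation. The Python sketch below is illustrative only: the array dimensions, the 4-byte element size and the 4096-byte page size are assumptions chosen for the example, and the function names are hypothetical.

```python
def column_major_offset(i, j, rows, elem=4):
    """Byte offset of A(I,J) when arrays are stored by columns (1-based)."""
    return ((j - 1) * rows + (i - 1)) * elem


def row_major_offset(i, j, cols, elem=4):
    """Byte offset of A(I,J) under the alternative mapping, stored by rows."""
    return ((i - 1) * cols + (j - 1)) * elem


def pages_touched(offsets, page_size=4096):
    """Number of distinct pages containing the given byte offsets."""
    return len({off // page_size for off in offsets})


rows = cols = 1024          # assumed dimensions of A
i = 1                       # the loop varies J with I fixed
by_columns = [column_major_offset(i, j, rows) for j in range(1, cols + 1)]
by_rows = [row_major_offset(i, j, cols) for j in range(1, cols + 1)]

print("pages touched, stored by columns:", pages_touched(by_columns))
print("pages touched, stored by rows:   ", pages_touched(by_rows))
```

With these sizes, consecutive values of J land on different pages when the array is stored by columns, while the same traversal stays within a single page when it is stored by rows.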
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